Using Knowledge-Based Scores for Identifying Best Speech Recognition Hypothesis
Abstract
This paper evaluates a knowledge-based scoring method for identifying the best speech recognition hypothesis (SRH) in a functioning multimodal dialogue system. Competing SRHs are evaluated in terms of their semantic coherence, using the high-level domain knowledge encoded in the ontology. We conducted an annotation experiment and showed that humans can reliably select the best SRH in a given N-best list (agreement 95.35%). The knowledge-based method correctly identifies 88.07% of the best SRHs (against a baseline of 63.91%), which is also an improvement over the automatic speech recognizer (ASR), whose accuracy is 83.88%.
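The abstract does not spell out the scoring function, so the following is only a minimal sketch of the general idea: map the words of each hypothesis to ontology concepts, score how many concept pairs are directly related, and pick the highest-scoring entry of the N-best list. The toy `ONTOLOGY`, the `LEXICON`, and the functions `coherence_score` and `best_hypothesis` are all hypothetical names introduced for illustration, not the paper's actual method or data.

```python
from itertools import combinations

# Hypothetical toy ontology: each concept maps to its directly related concepts.
ONTOLOGY = {
    "Cinema": {"Movie", "Location"},
    "Movie": {"Cinema", "Showtime"},
    "Showtime": {"Movie"},
    "Location": {"Cinema"},
    "Animal": set(),
}

# Hypothetical lexicon mapping surface words to ontology concepts.
LEXICON = {
    "cinema": "Cinema",
    "movie": "Movie",
    "tonight": "Showtime",
    "where": "Location",
    "cat": "Animal",
}


def coherence_score(hypothesis):
    """Fraction of concept pairs in the hypothesis that are directly related."""
    words = hypothesis.lower().split()
    concepts = sorted({LEXICON[w] for w in words if w in LEXICON})
    pairs = list(combinations(concepts, 2))
    if not pairs:
        # Fewer than two known concepts: no evidence of coherence.
        return 0.0
    related = sum(1 for a, b in pairs if b in ONTOLOGY[a] or a in ONTOLOGY[b])
    return related / len(pairs)


def best_hypothesis(n_best):
    """Return the hypothesis with the highest coherence score (first one on ties)."""
    return max(n_best, key=coherence_score)


# Assumed example N-best list, invented for this sketch.
n_best = [
    "which cat is showing tonight",
    "which movie is showing tonight",
    "which movie is showing to night",
]
print(best_hypothesis(n_best))
```

In this toy setup the ranking depends entirely on the hand-written ontology and lexicon; a real system would presumably combine such a score with a much richer domain model.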
Similar resources
Allophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
A New Method for Robust Speech Recognition Based on Missing Data Using a Bidirectional Neural Network
The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing-feature method. In this approach, the components of the time-frequency representation of the signal (spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing and deleted, then replaced using the remaining components and statistical ...
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
Mining Call Center Conversations exhibiting Similar Affective States
Automatically detecting and identifying emotions in large call center calls is essential for spotting conversations that require further action. Most often, statistical models generated from annotated emotional speech are used to design an emotion detection system. But annotation requires a substantial amount of human intervention and cost, and may not be available for call center calls because of the in...
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...